Search CORE

9 research outputs found

A Graph-structured Dataset for Wikipedia Research

Author: Aspert Nicolas
Miz Volodymyr
Ricaud Benjamin
Vandergheynst Pierre
Publication venue: Association for Computing Machinery (ACM)
Publication date: 20/03/2019

Wikipedia is a rich and invaluable source of information. Its central place on the Web makes it a particularly interesting object of study for scientists. Researchers from different domains have used various complex datasets related to Wikipedia to study language, social behavior, knowledge organization, and network theory. While the dataset is a scientific treasure, its large size hinders pre-processing and may be a challenging obstacle for potential new studies. This issue is particularly acute in scientific domains where researchers may not be technically savvy or experienced in data processing. On the one hand, Wikipedia dumps are large, which makes parsing and extracting relevant information cumbersome. On the other hand, the API is straightforward to use but limited to a relatively small number of requests. The middle ground lies at the mesoscopic scale, where researchers need a subset of Wikipedia ranging from thousands to hundreds of thousands of pages, yet no efficient solution exists at this scale. In this work, we propose an efficient data structure to query and access subnetworks of Wikipedia pages and categories. We provide convenient tools for accessing and filtering viewership statistics, or "pagecounts", of Wikipedia web pages. The dataset organization leverages principles of graph databases that allow rapid and intuitive access to subgraphs of Wikipedia articles and categories. The dataset and deployment guidelines are available on the LTS2 website: https://lts2.epfl.ch/Datasets/Wikipedia/
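The abstract describes graph-database-style access to subgraphs of pages and categories. As a rough illustration only, the sketch below uses plain Python dictionaries in place of a graph database: it walks a category tree down to a chosen depth, collects the member pages, and keeps only the hyperlinks among them. The toy data and the functions `pages_in_category` and `induced_subgraph` are hypothetical and do not reflect the dataset's actual schema or query interface.

```python
from collections import deque

# Hypothetical toy data (NOT the dataset's real schema):
# category -> subcategories, category -> member pages, page -> linked pages.
SUBCATEGORIES = {"Physics": ["Quantum mechanics"], "Quantum mechanics": []}
CATEGORY_PAGES = {
    "Physics": ["Energy", "Force"],
    "Quantum mechanics": ["Photon", "Wave function"],
}
LINKS = {
    "Energy": ["Force", "Photon"],
    "Force": ["Energy"],
    "Photon": ["Energy", "Wave function", "Light"],
    "Wave function": ["Photon"],
    "Light": ["Photon"],
}


def pages_in_category(root, max_depth):
    """Collect pages of `root` and of its subcategories up to `max_depth` levels down."""
    pages, seen = set(), {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        pages.update(CATEGORY_PAGES.get(category, []))
        if depth < max_depth:
            for sub in SUBCATEGORIES.get(category, []):
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return pages


def induced_subgraph(pages):
    """Keep only hyperlinks whose source and target both belong to `pages`."""
    return {p: sorted(t for t in LINKS.get(p, []) if t in pages) for p in sorted(pages)}
```

For example, `induced_subgraph(pages_in_category("Physics", 1))` returns the four pages under "Physics" and "Quantum mechanics" with their mutual links, while "Light", which belongs to neither category, is dropped. Bounding the category depth is one simple way to keep such a request at the mesoscopic scale the abstract targets.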

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Anomaly detection in the dynamics of web and social networks

Author: Benzi Kirell
Miz Volodymyr
Ricaud Benjamin
Vandergheynst Pierre
Publication date: 01/01/2019

In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. The input may be a static graph with dynamic node attributes (e.g. time series) or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with good precision using a memory network. The presented approach is scalable, and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: the Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by real-world events that impact the network dynamics. In addition, the structure of the clusters and the analysis of the time evolution associated with the detected events reveal interesting facts on how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets.
Comment: The Web Conference 2019, 10 pages, 7 figures
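To make "combining graph and time information" concrete, here is a simplified sketch in the spirit of Hebbian learning, the learning rule associated with Hopfield networks. Each node's time series is turned into binary bursts (assumed rule: a value above mean + k·std of its own series), each existing edge accumulates the co-activation of its two endpoints over time, and connected groups of strongly co-activated nodes are reported as anomalous clusters. The burst rule, the parameters `k` and `min_weight`, and all function names are assumptions for illustration, not the authors' distributed implementation.

```python
import statistics


def burst_signal(series, k=2.0):
    """Binary spikes: 1 where a value exceeds mean + k * std of its own series (assumed rule)."""
    mu = statistics.mean(series)
    sigma = statistics.pstdev(series)
    return [1 if x > mu + k * sigma else 0 for x in series]


def hebbian_weights(edges, activity, k=2.0):
    """Hebbian co-activation restricted to existing edges: w_ij = sum_t s_i(t) * s_j(t)."""
    spikes = {node: burst_signal(ts, k) for node, ts in activity.items()}
    return {(i, j): sum(a * b for a, b in zip(spikes[i], spikes[j])) for i, j in edges}


def anomalous_clusters(edges, activity, k=2.0, min_weight=1):
    """Connected components of the subgraph made of edges with weight >= min_weight."""
    adjacency = {}
    for (i, j), w in hebbian_weights(edges, activity, k).items():
        if w >= min_weight:
            adjacency.setdefault(i, set()).add(j)
            adjacency.setdefault(j, set()).add(i)
    clusters, seen = [], set()
    for start in adjacency:
        if start in seen:
            continue
        # Iterative depth-first search collecting one component.
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                component.add(node)
                stack.extend(adjacency[node] - seen)
        clusters.append(component)
    return clusters
```

For example, with edges `[("A", "B"), ("B", "C"), ("C", "D")]`, where A and B burst on the same day while C bursts alone and D stays flat, only `{"A", "B"}` is reported. Learning only on existing edges keeps the cost proportional to the number of edges times the number of time steps, rather than to all node pairs.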

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref

Anomaly detection in the dynamics of web and social networks

Author: Benzi Kirell
Miz Volodymyr
Ricaud Benjamin
Vandergheynst Pierre
Publication date: 22/01/2019

In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. The input may be a static graph with dynamic node attributes (e.g. time series) or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with good precision using a memory network. The presented approach is scalable, and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: the Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by real-world events that impact the network dynamics. In addition, the structure of the clusters and the analysis of the time evolution associated with the detected events reveal interesting facts on how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets.

Infoscience - École polytechnique fédérale de Lausanne

Dynamic pattern recognition in large-scale graphs with applications to social networks

Author: Miz Volodymyr
Publication venue: Lausanne, EPFL
Publication date: 11/12/2020

A graph is a versatile data structure that facilitates the representation of interactions among objects in various complex systems. Very often these objects have attributes whose measurements change over time, reflecting the dynamics of the system. This general data framework can be used in many fields to represent complex data structures: brain networks and neuronal spikes, web networks and clickstreams, social networks and user activity, among others. In all of these examples, the structural and dynamic components of the data are inseparable, which significantly complicates the detection, analysis, and interpretation of patterns that emerge in the networks. The increasing size and complexity of graph-structured data require scalable and interpretable algorithms for dynamic pattern detection in such systems. In this dissertation, we present an unsupervised approach for dynamic pattern detection in large-scale graphs. In this approach, we combine intuitions derived from attention mechanisms, Hopfield networks, and memory networks to build scalable, efficient, and interpretable algorithms. We then demonstrate multiple applications of this approach in recommendation systems, information recovery algorithms, and collective behavior studies. Additionally, we use our algorithm to detect dynamic activity patterns in social and communication networks. We conduct extensive experiments on Wikipedia data, detecting and analyzing patterns in the viewership activity of its web network. To study the collective behavior of Wikipedia readers, we develop an automated pattern interpretation model, which allows trending topics to be compared across multiple language editions of Wikipedia. The results of the experiments reveal provocative insights into how people interact and search for information in online social networking environments, opening new avenues for future research on collective behavior analysis at a large scale.
Finally, we present a distributed data processing framework for Wikipedia server logs that allows others to reproduce all pattern detection experiments presented in this thesis and to conduct similar collective behavior studies on the latest data.

Infoscience - École polytechnique fédérale de Lausanne

Spikyball Sampling: Exploring Large Networks via an Inhomogeneous Filtered Diffusion

Author: Aspert Nicolas
Miz Volodymyr
Ricaud Benjamin
Publication venue: MDPI AG
Publication date: 05/11/2020

Studying real-world networks such as social networks or web networks is a challenge. These networks often combine a complex, highly connected structure with a large size. We propose a new approach for large-scale networks that is able to automatically sample user-defined relevant parts of a network. Starting from a few selected places in the network and a reduced set of expansion rules, the method adopts a filtered breadth-first search approach that expands only through edges and nodes matching these rules. Moreover, the expansion is performed over a random subset of neighbors at each step to further mitigate the overwhelming number of connections that may exist in large graphs. This carries the image of a “spiky” expansion. We show that this approach generalizes and extends previous exploration sampling methods, such as Snowball or Forest Fire. We demonstrate its ability to capture groups of nodes with high interactions while discarding weakly connected nodes that are often numerous in social networks and may hide important structures.
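The expansion loop described above can be sketched as a layer-by-layer filtered breadth-first search that keeps only a random subset of each node's eligible neighbors. This is a simplified illustration under stated assumptions: uniform sampling of at most `max_neighbors` neighbors per node, user-supplied `node_ok` and `edge_ok` predicates playing the role of the expansion rules, and a hypothetical dict-of-dicts graph format. The published method's own sampling distribution, suggested by the "inhomogeneous filtered diffusion" in its title, is not reproduced here, and `spiky_expansion` is a hypothetical name.

```python
import random


def spiky_expansion(graph, seeds, node_ok, edge_ok, max_neighbors, n_layers, rng):
    """Filtered BFS that follows only a random subset of eligible neighbors at each step.

    graph: dict mapping node -> dict mapping neighbor -> edge attributes (assumed format).
    """
    sampled = set(seeds)
    frontier = list(seeds)
    for _ in range(n_layers):
        next_frontier = []
        for node in frontier:
            # Expansion rules: skip visited nodes and anything the filters reject.
            eligible = [
                nbr for nbr, attrs in graph.get(node, {}).items()
                if nbr not in sampled and node_ok(nbr) and edge_ok(attrs)
            ]
            # Keeping a random subset is what makes the expansion "spiky".
            chosen = rng.sample(eligible, min(max_neighbors, len(eligible)))
            for nbr in chosen:
                if nbr not in sampled:  # may have been added by another node in this layer
                    sampled.add(nbr)
                    next_frontier.append(nbr)
        if not next_frontier:
            break
        frontier = next_frontier
    return sampled
```

With accept-all predicates and a fixed `max_neighbors`, the loop behaves roughly like snowball sampling; drawing the number of followed neighbors at random per node moves it toward a Forest Fire-style exploration. Passing the filters as arguments keeps the user-defined relevance rules separate from the random sampling step.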

Infoscience - École polytechnique fédérale de Lausanne

A Lie-Group Adaptive Method to Identify the Radiative Coefficients in Parabolic Partial Differential Equations

Author: Benzi Kirell
Miz Volodymyr
Ricaud Benjamin
Vandergheynst Pierre
Publication date: 13/10/2012

In this work, we propose a new, fast and scalable method for anomaly detection in large time-evolving graphs. The input may be a static graph with dynamic node attributes (e.g. time series) or a graph evolving in time, such as a temporal network. We define an anomaly as a localized increase in temporal activity in a cluster of nodes. The algorithm is unsupervised. It is able to detect and track anomalous activity in a dynamic network despite the noise from multiple interfering sources. We use the Hopfield network model of memory to combine the graph and time information. We show that anomalies can be spotted with good precision using a memory network. The presented approach is scalable, and we provide a distributed implementation of the algorithm. To demonstrate its efficiency, we apply it to two datasets: the Enron Email dataset and Wikipedia page views. We show that the anomalous spikes are triggered by real-world events that impact the network dynamics. In addition, the structure of the clusters and the analysis of the time evolution associated with the detected events reveal interesting facts on how humans interact, exchange and search for information, opening the door to new quantitative studies on collective and social behavior on large and dynamic datasets.

Infoscience - École polytechnique fédérale de Lausanne

National Taiwan University Repository

What is Trending on Wikipedia? Capturing Trends and Language Biases Across Wikipedia Editions

Author: Aspert Nicolas
Hanna Joëlle
Miz Volodymyr
Ricaud Benjamin
Vandergheynst Pierre
Publication venue: Association for Computing Machinery (ACM)
Publication date: 17/02/2020

In this work, we propose an automatic evaluation and comparison of the browsing behavior of Wikipedia readers that can be applied to any language edition of Wikipedia. As an example, we focus on the English, French, and Russian editions during the last four months of 2018. The proposed method has three steps. First, it extracts the most trending articles over a chosen period of time. Second, it performs semi-supervised topic extraction. Third, it compares topics across languages. The automated processing works with data that combine Wikipedia's graph of hyperlinks, pageview statistics, and page summaries. The results show that people share a common interest and curiosity for entertainment (e.g. movies, music, and sports), independently of their language. Differences appear in topics related to local events or cultural particularities. Interactive visualizations showing clusters of trending pages in each language edition are available online at https://wiki-insights.epfl.ch/wikitrend
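Steps one and three of the method can be illustrated with a small sketch. The trending criterion below (peak daily views above `ratio` times the page's median daily views) and the use of Jaccard similarity to compare topic sets are assumptions chosen for illustration, not the paper's actual criteria. Step two, the semi-supervised topic extraction, is not shown, so topic sets are passed in directly. The names `trending_pages` and `topic_overlap` are hypothetical.

```python
import statistics


def trending_pages(views, ratio=5.0):
    """Hypothetical step-1 rule: a page trends if its peak exceeds `ratio` x its median views.

    views: dict mapping page title -> list of daily view counts for the chosen period.
    The median is floored at 1 so that rarely viewed pages need a real peak to qualify.
    """
    return {
        page for page, series in views.items()
        if max(series) > ratio * max(statistics.median(series), 1)
    }


def topic_overlap(topics_a, topics_b):
    """Jaccard similarity between the topic sets found in two language editions."""
    if not topics_a and not topics_b:
        return 1.0
    return len(topics_a & topics_b) / len(topics_a | topics_b)
```

Topics shared across editions, such as the entertainment topics the abstract reports, push the overlap toward 1, while edition-specific topics tied to local events or cultural particularities pull it down.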

Infoscience - École polytechnique fédérale de Lausanne

arXiv.org e-Print Archive

Crossref

Summary of tutorials at The Web Conference 2021

Author: Albert Javier
Altunina Olesia
Aref Samin
Aspert Nicolas
Avram Tudor Mihai
Baidakova Daria
Benhalloum Amine
Bhagat Smriti
Bian Yatao
Celebi Onur
Chen Jiawei
Cheng Hong
Courdier Evann
Couto Francisco M.
Cvetinovic Dragan
Defferrard Michael
Diesner Jana
Dinh Ly
Drutsa Alexey
Dy Jennifer
Fakhraei Shobeir
Faloutsos Christos
Fan Wenqi
Fan Yicheng
Feng Fuli
Ferng Chun-Sung
Geng Xiubo
Gessert Felix
Goldenberg Dmitri
Gong Ming
Gopalan Arjun
Groth Paul
He Xiangnan
Heydon Allan
Howell Rose
Huang Junzhou
Huang Wenbing
Ioannidis Stratis
Jeunen Olivier
Jiang Daxin
Jose Johny
Juan Da-Cheng
Kenthapadi Krishnaram
Koenig Mario
Laurent Florian
Lisena Pasquale
Lu Chun-Ta
Magalhaes Cesar Ilharco
Merono-Penuela Albert
Mishra Shubhanshu
Miz Volodymyr
Mohanty Sharada
Muller Martin
Packer Ben
Pei Jian
Pham Philip
Popov Nikita
Rezapour Rezvaneh
Ricaud Benjamin
Ritter Norbert
Rohde David
Rong Yu
Sakhi Otmane
Sameki Mehrnoosh
Scheller Christian
Schneider Manuel
Schraner Yanick
Sephus Nashlie
Shou Linjun
Succo Stephan
Sun Fuchun
Tang Jiliang
Teinemaa Irene
Tsinadze Levan
Ustalov Dmitry
Vasile Flavian
Wang Xiang
Wang Yueqi
West Robert
Wingerath Wolfram
Wollmer Benjamin
Xu Tingyang
Yildiz Ilkay
Yin Dawei
Yu George
Zhao Xiangyu
Zhou Xingjie
Zitnik Marinka
Publication date: 01/01/2021

Institutional Repository Universiteit Antwerpen